Inter-document Similarity in Web Searches
Abstract
Existing Web search services fail to help users whose information needs are broad, vague, or hard to express as a set of keywords. This dissertation investigates retrieval techniques based on inter-document similarity, measured either through the textual contents of documents or through the links between them. Unlike traditional retrieval approaches, which match documents against keywords and produce one-dimensional ranked lists of results, techniques based on inter-document similarity offer better support for result visualization, as well as alternative ways of expressing information needs. A Portuguese Web search engine was extended with two inter-document similarity algorithms: result-set clustering and related pages. The system was evaluated in a user survey, which showed that both algorithms are well accepted.
KEY-WORDS: Web Information Retrieval, Clustering, Similarity Search, Web Mining.
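The two kinds of inter-document similarity named above, textual and link-based, can be illustrated with a minimal sketch. It assumes raw term-frequency vectors compared by cosine similarity for text, a single-pass "leader" algorithm for result-set clustering, and co-citation counts (how many pages link to both documents) for related pages. These are common textbook choices picked for illustration, not the algorithms implemented in the dissertation's search engine, and all function names and the `threshold` and `k` parameters are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    # Assumed tokenizer: lowercase and split on whitespace (no stemming or stopwords).
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def cluster_results(snippets, threshold=0.5):
    # Single-pass leader clustering of a result set: each result joins the first
    # cluster whose leader (first member) is at least `threshold` similar to it,
    # otherwise it starts a new cluster. Returns lists of result indices.
    vectors = [term_vector(s) for s in snippets]
    clusters = []
    for i, v in enumerate(vectors):
        for c in clusters:
            if cosine(vectors[c[0]], v) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def cocitation(links, a, b):
    # links maps each page to the set of pages it links to;
    # the score is the number of pages that link to both a and b.
    return sum(1 for targets in links.values() if a in targets and b in targets)


def related_pages(links, page, k=3):
    # Candidates are pages co-cited with `page`; rank by co-citation count,
    # breaking ties alphabetically, and keep the top k.
    candidates = {p for targets in links.values() if page in targets for p in targets}
    candidates.discard(page)
    scored = sorted(((cocitation(links, page, c), c) for c in candidates),
                    key=lambda sc: (-sc[0], sc[1]))
    return [c for _, c in scored[:k]]


if __name__ == "__main__":
    results = ["jaguar car dealer", "jaguar car price",
               "jaguar cat habitat", "big cat habitat"]
    print(cluster_results(results))  # [[0, 1], [2, 3]]
    links = {"a": {"x", "y"}, "b": {"x", "y", "z"}, "c": {"w", "z"}}
    print(related_pages(links, "x"))  # ['y', 'z']
```

The leader algorithm is used only because it fits in a few lines and needs a single pass over the result list; the order of results affects which result becomes a cluster leader, which a production clusterer would typically avoid.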
Similar Resources
An Ensemble Click Model for Web Document Ranking
Every year, web search engine providers spend more and more money on document ranking in search engine result pages (SERPs). Click models provide useful information for ranking documents in SERPs by modeling the interactions between users and search engines. Here, three modules are employed to create a hybrid click model; the first module is a PGM-based click model, the second module in a d...
The Role of Semantic Relations in Improving the Results of a Citation Recommendation System (Selected Paper of the 17th National Conference of the Computer Society of Iran)
With the increasing growth of scientific documents on the Web, it is difficult to select relevant documents. A citation recommendation system receives a text and recommends documents to be cited by that text. Such recommendations help researchers find the texts relevant to them. Based on semantic relations, this paper presents a new indicator to measure the similarity between documents an...
Clustering multilingual documents by estimating text-to-text semantic relatedness
This thesis is about multilingual document clustering by estimating semantic relatedness between multilingual texts. Specifically, we focus on the task of clustering multilingual documents with very limited or no supervisory information. We present two approaches to address the problem: a comparable-corpora based approach and a web-searches based approach. Our first approach derives pairwi...
Effective Hybrid Recommendation Combining Users-Searches Correlations Using Tensors
Most recommendation methods employ item-item similarity measures or use ratings data to generate recommendations. These methods use traditional two-dimensional models to find interrelationships between similar users and products. This paper proposes a novel recommendation method using a multi-dimensional model, the tensor, to group similar users based on common search behaviour, and then finding a...
Web Document Classification based on Hyperlinks and Document Semantics
Besides its basic content, a web document also contains a set of hyperlinks pointing to other related documents. The hyperlinks in a document provide considerable information about its relations to other web documents. By analyzing the hyperlinks in documents, the interrelationships among documents can be identified. In this paper, we propose an algorithm to classify web documents into subsets based on hyperl...